From: Bill Dally [billd@csl.stanford.edu] Sent: Thursday, October 11, 2001 8:56 AM To: Bill Mark; Bill Dally Cc: Ian Buck; lance@leland.Stanford.EDU; ujk@leland.Stanford.EDU; Billmark@lambert.stanford.edu; pmattson@leland.Stanford.EDU; serebrin@stanford.edu; merez@stanford.edu; jowens@graphics.stanford.edu; horowitz@chroma.stanford.edu; hanrahan@graphics.stanford.edu Subject: Re: Brook - multiple input and output streams Bill, Three comments. First, by the very one-dimensional nature of a stream, the use of persistant state is one dimensional - early elements affect later elements but not vice-versa. Thus, the computation on persistant state can be converted to a log-depth parallel prefix computation - most of which takes place locally in one processing "element" - and only the top levels of the tree are between elements. Key to managing this type of communication - which we need anyway for reductions - like the "max" in Ian's example - is setting the proper traversal order. The traversal order wants to be in order of least expensive communication/synchronization. Roughly speaking the order is - across "clusters" in a clustered processor (MIMD or SIMD), across time in one processor, across "nearby" processors (e.g., on a chip or board), across distant processors. This order means that very little of the log tree structure crosses the expensive links. Whether a processor is SIMD or MIMD is immaterial to the cost of communication. What really matters is how tightly synchornized they are - not that they are all doing the same instruction. Processors that are synchronized can exchange some data directly - without buffering or waiting. There are lots of good technologies to synchronize MIMD processors (read our M-Machine papers) - SIMD processors are inherently synchronized. ----Bill At 11:46 PM 10/10/2001 -0700, Bill Mark wrote: >On Wed, 10 Oct 2001, Bill Dally wrote: > > >... > > The conditional and unconditional << and >> operators in KernelC are pretty > > useful. You should consider adopting this model in Brook - where the > > kernel is 'active' for the duration of the stream - rather than a single > > element - and can explicitly read from inputs and write to outputs. > > ... > >"Persistent kernels" are powerful, but allowing them does present >some challenges, especially from a programming-model point of view. > >What happens when you have more than one processing unit operating on >the stream? There are duplicate copies of the persistent state, >possibly with different values. On Imagine, you have to add explicit >communication between the processors to do something reasonable in >this situation. My impression is that this communication code is >often designed for a particular number of processors. It's not clear >to me that there's any simple way to hide the number of processors. > >If the language provided the "illusion" of one processor, the compiler >might be able to insert the appropriate communication, but I suspect >that it would then be very easy for the programmer to create code that >would run very slowly when the number of stream processors is high. >We should look at this question in more detail; my instinct may be >wrong here. > >Allowing communication between processing units also seems to require >that the processors be SIMD. SIMD's ability to simply perform >communication of this type is perhaps its greatest advantage over >MIMD. > >Bill M. > > > -------------------------------------------------------------------------- Bill Dally billd@csl.stanford.edu (650)725-8945 Professor of Electrical Engineering and Computer Science FAX(650)725-6949 Computer Systems Laboratory, Stanford University Gates Room 301 Stanford, CA 94305 http://csl.stanford.edu/~billd --------------------------------------------------------------------------